---
title: Overlap Refinery
description: Refine chunks by adding overlapping context from adjacent chunks.
icon: "arrows-left-right-to-line"
iconType: "solid"
---

The `OverlapRefinery` enhances chunks by incorporating context from neighboring chunks. This is useful for tasks where maintaining contextual continuity between chunks is important, such as question answering or summarization over long documents. It can add context as a prefix (from the preceding chunk) or a suffix (from the next chunk).

## API Reference

To use the `OverlapRefinery` via the API, check out the [API reference documentation](../../api/refineries/overlap).

## Initialization

To use the `OverlapRefinery`, initialize it with the desired parameters. You can specify a tokenizer, context size, overlap mode, method, and other options.

```python
from chonkie import OverlapRefinery

# Initialize with default character-level overlap (25% context size)
overlap_refinery = OverlapRefinery()

# Initialize with a specific tokenizer and context size
overlap_refinery_token = OverlapRefinery(
    tokenizer="character",             # Default tokenizer (or use "gpt2", etc.)
    context_size=0.25,                 # The size of the context to add to the chunks.
    method="prefix",                   # Add context from the previous chunk
    merge=True                         # Merge context directly into chunk text
)

# Initialize for recursive overlap based on rules
from chonkie import RecursiveRules, RecursiveLevel
rules = RecursiveRules(
    levels=[
        RecursiveLevel(delimiters=["\n\n"], include_delim="prev"),
        RecursiveLevel(delimiters=["."], include_delim="prev"),
        RecursiveLevel(whitespace=True)
    ]
)
overlap_refinery_recursive = OverlapRefinery(
    tokenizer="character",
    context_size=0.25,
    mode="recursive",
    rules=rules,
    method="suffix"
)
```

## Usage

Use the `OverlapRefinery` object as a callable or use the `refine` method to add overlapping context to your chunks.

```python
from chonkie import TokenChunker, OverlapRefinery

test_string = "This is the first sentence. This is the second sentence, providing context. This is the third sentence, which needs context from the second."
chunker = TokenChunker()
chunks = chunker(test_string)

# Initialize refinery to add suffix overlap
overlap_refinery = OverlapRefinery(
    tokenizer="character",
    context_size=0.5,
    method="suffix",
    merge=True
)

refined_chunks = overlap_refinery(chunks)
```

## Parameters

<ParamField
  path="tokenizer"
  type="Union[str, Callable, Any]"
  default='"character"'
>
  The tokenizer to use for calculating overlap size. Can be a
  string identifier (e.g., "character", "word", "gpt2"), a callable, or a
  `chonkie.Tokenizer` instance. Defaults to "character".
</ParamField>

<ParamField path="context_size" type="Union[int, float]" default="0.25">
  The size of the overlap context. If an `int`, it's the absolute number of
  tokens. If a `float` (between 0 and 1), it's the fraction of the maximum chunk
  token count.
</ParamField>

<ParamField path="mode" type='Literal["token", "recursive"]' default='"token"'>
  The mode for calculating overlap. `"token"` uses the tokenizer directly.
  `"recursive"` uses hierarchical splitting based on `rules`.
</ParamField>

<ParamField path="method" type='Literal["suffix", "prefix"]' default='"suffix"'>
  The method for adding context. `"suffix"` adds context from the *next* chunk
  to the end of the current chunk. `"prefix"` adds context from the *previous*
  chunk to the beginning of the current chunk.
</ParamField>

<ParamField path="rules" type="RecursiveRules" default="RecursiveRules()">
  The rules used for splitting text when `mode` is `"recursive"`. Defines
  delimiters and behavior at different hierarchical levels. See
  `chonkie.types.RecursiveRules`.
</ParamField>

<ParamField path="merge" type="bool" default="True">
  If `True`, the calculated context is directly prepended (for `prefix`) or
  appended (for `suffix`) to the `chunk.text`. If `False`, the context is stored
  in `chunk.context` attribute without modifying `chunk.text`.
</ParamField>

<ParamField path="inplace" type="bool" default="True">
  If `True`, modifies the input list of chunks directly. If `False`, returns a
  new list of modified chunks.
</ParamField>
